In [ ]:
import csv
We're going to use something called a with
statement to open a file and read the contents. The open()
function takes at least two arguments: The path to the file you're opening and what "mode" you're opening it in.
To start with, we're going to use the 'r'
mode to read the data. We'll use the default arguments for delimiter -- comma -- and we don't need to specify a quote character.
Important: If you open a data file in w
(write) mode, anything that's already in the file will be erased.
The file we're using -- MLB roster data from 2017 -- lives at data/mlb.csv
.
Once we have the file open, we're going to use some functionality from the csv
module to iterate over the lines of data and print each one.
Specifically, we're going to use the csv.reader
method, which returns a list of lines in the data file. Each line, in turn, is a list of the "cells" of data in that line.
Then we're going to loop over the lines of data and print each line. We can also use bracket notation to retrieve elements from inside each line of data.
In [ ]:
# open the MLB data file `as` mlb
with open('data/mlb.csv', 'r') as mlb:
# create a reader object
reader = csv.reader(mlb)
# loop over the rows in the file
for row in reader:
# assign variables to each element in the row (shortcut!)
name, team, position, salary, start_year, end_year, years = row
# print the row, which is a list
print(row)
In [ ]:
# open the MLB data file `as` mlb
with open('data/mlb.csv', 'r') as mlb:
# create a reader object
reader = csv.reader(mlb)
# move past the header row
next(reader)
# loop over the rows in the file
for row in reader:
# assign variables to each element in the row (shortcut!)
name, team, position, salary, start_year, end_year, years = row
# print the line of data ~only~ if the player is on the Twins
if team == 'MIN':
# print the row, which is a list
print(row)
In [ ]:
# open the MLB data file `as` mlb
with open('data/mlb.csv', 'r') as mlb:
# create a reader object
reader = csv.reader(mlb)
# move past the header row
next(reader)
# loop over the rows in the file
for row in reader:
# assign variables to each element in the row (shortcut!)
name, team, position, salary, start_year, end_year, years = row
# print the line of data ~only~ if the player is on the Twins
if int(salary) >= 1000000:
# print the row, which is a list
print(name, salary)
Sometimes it's more convenient to work with data files as a list of dictionaries instead of a list of lists. That way, you don't have to remember the position of each "column" of data -- you can just reference the column name. To do it, we'll use a csv.DictReader
object instead of a csv.reader
object. Otherwise the code is much the same.
In [ ]:
# open the MLB data file `as` mlb
with open('data/mlb.csv', 'r') as mlb:
# create a reader object
reader = csv.DictReader(mlb)
# loop over the rows in the file
for row in reader:
# print just the player's name (the column header is "NAME")
print(row['NAME'])
In [ ]:
# define the column names
COLNAMES = ['name', 'org', 'position']
# let's make a few rows of data to write
DATA_TO_WRITE = [
['Cody', 'IRE', 'Training Director'],
['Maggie', 'The New York Times', 'Reporter'],
['Donald', 'The White House', 'President']
]
# open an output file in write mode
with open('people-list.csv', 'w') as outfile:
# create a writer object
writer = csv.writer(outfile)
# write the header row
writer.writerow(COLNAMES)
# loop over the data and write to file
for human in DATA_TO_WRITE:
writer.writerow(human)
In [ ]:
# define the column names
COLNAMES = ['name', 'org', 'position']
# let's make a few rows of data to write
DATA_TO_WRITE = [
{'name': 'Cody', 'org': 'IRE', 'position': 'Training Director'},
{'name': 'Maggie', 'org': 'The New York Times', 'position': 'Reporter'},
{'name': 'Donald', 'org': 'The White House', 'position': 'President'}
]
# open an output file in write mode
with open('people-dict.csv', 'w') as outfile:
# create a writer object -- pass the list of column names to the `fieldnames` keyword argument
writer = csv.DictWriter(outfile, fieldnames=COLNAMES)
# use the writeheader method to write the header row
writer.writeheader()
# loop over the data and write to file
for human in DATA_TO_WRITE:
writer.writerow(human)
Sometimes you want to open multiple files at the same time. One thing you might want to do: Opening a file of raw data in read mode, clean each row in a loop and write out the clean data to a new file.
You can open multiple files in the same with
block -- just separate your open()
functions with a comma.
For this example, we're not going to do any cleaning -- we're just going to copy the contents of one file to another.
In [ ]:
# open the MLB data file `as` mlb
# also, open `mlb-copy.csv` to write to
with open('data/mlb.csv', 'r') as mlb, open('mlb-copy.csv', 'w') as mlb_copy:
# create a reader object
reader = csv.DictReader(mlb)
# create a writer object
# we're going to use the `fieldnames` attribute of the DictReader object
# as our output headers, as well
# b/c we're basically just making a copy
writer = csv.DictWriter(mlb_copy, fieldnames=reader.fieldnames)
# write header row
writer.writeheader()
# loop over the rows in the file
for row in reader:
# what type of object is `row`?
# how would we find out?
# write row to output file
writer.writerow(row)